Probabilistic Exploration in Planning while Learning
Abstract
Sequential decision tasks with incomplete information are characterized by the exploration problem; namely, the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been increasing interest in studying planning-while-learning algorithms for these decision tasks. In this paper we focus on the exploration problem in reinforcement learning, and Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and exhibit limited scalability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e., exploration). The experimentation should be sufficient for selecting, with statistical significance, a locally optimal plan (i.e., exploitation). For this purpose, we develop a probabilistic hill-climbing algorithm that uses a statistical selection procedure to decide how much exploration is needed for selecting a plan which is, with arbitrarily high probability, arbitrarily close to a locally optimal one. Due to its generality, the algorithm can be employed as the exploration strategy of robust Q-learning. An experiment on a relatively complex control task shows that the proposed exploration strategy performs better than a typical exploration strategy.

… continuous flow of events in time. Effective decision-making requires resolution of uncertainty as early as possible. The tendency to minimize losses resulting from wrong predictions of future events necessitates the division of the problem solution into steps. A decision at each step must make use of the information from the evolution of the events experienced thus far, but that evolution, in fact, depends on the type of decision made at each step.
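The abstract does not spell out the statistical selection procedure, so the following is only a minimal Python sketch of the idea, assuming returns normalized to [0, 1], hashable plan objects, and a Hoeffding confidence bound as a stand-in for the paper's actual selection test; `sample_return`, `neighbors`, and all parameter names are hypothetical callbacks and knobs, not the authors' interface.

```python
import math

def select_best(candidates, sample_return, epsilon=0.1, delta=0.05,
                batch=30, max_samples=5000):
    """Sequentially sample candidate plans until one can be declared
    epsilon-close to the best with probability at least 1 - delta."""
    returns = {c: [] for c in candidates}
    while True:
        for c in candidates:
            returns[c].extend(sample_return(c) for _ in range(batch))
        n = len(returns[candidates[0]])
        # Hoeffding half-width; assumes returns are bounded in [0, 1].
        half = math.sqrt(math.log(2.0 * len(candidates) / delta) / (2.0 * n))
        means = {c: sum(v) / n for c, v in returns.items()}
        leader = max(candidates, key=lambda c: means[c])
        rivals_ruled_out = all(
            means[c] + half <= means[leader] - half + epsilon
            for c in candidates if c is not leader
        )
        if rivals_ruled_out or n >= max_samples:
            return leader

def probabilistic_hill_climb(initial_plan, neighbors, sample_return, **kwargs):
    """Climb from plan to plan, exploring only enough to make each
    comparison against the neighbors statistically significant."""
    current = initial_plan
    while True:
        candidates = [current] + list(neighbors(current))
        best = select_best(candidates, sample_return, **kwargs)
        if best is current:
            return current  # no neighbor is significantly better: locally optimal
        current = best
```

The point of the sketch is the stopping rule: exploration (further sampling) continues only until the confidence intervals separate a leader from its rivals to within epsilon, after which the algorithm exploits by committing to that plan, which is how the selection procedure bounds the amount of exploration spent at each hill-climbing step.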
Similar resources
Probabilistic Exploration in Planning while Learning
In decision-theoretic planning, Pemberton and Korf (1994) have proposed separate heuristic functions for exploration and decision-making in incremental real-time search algorithms. Draper et al. (1994) have developed a probabilistic planning algorithm that performs both information-producing actions and contingent planning actions. Our exploration strategy could be applied to these planning task...
FF + FPG: Guiding a Policy-Gradient Planner
The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG’s weakness is potentially long learning time...
Exploration of Arak Medical Students’ Experiences on Effective Factors in Active Learning: A Qualitative Research
Introduction: Medical students should use active learning to improve their daily duties and medical services. The goal of this study is to explore medical students’ experiences on effective factors in active learning. Methods: This qualitative study was conducted using the content analysis method at Arak University of Medical Sciences. Data were collected via interviews. The study started with p...
On-line Learning of Macro Planning Operators using Probabilistic Estimations of Cause-Effects
In this work we propose an on-line learning method for learning action rules for planning. The system uses a probabilistic approach of a constructive induction method that combines a beam search with an example-based search over candidate rules to find those that more concisely describe the world dynamics. The approach permits a rapid integration of the knowledge acquired from experience. Explo...
Action Schema Networks: Generalised Policies with Deep Learning
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the net...
Journal title:
Volume & issue:
Pages: -
Publication date: 1995